FEAT: Tool Use + MCP#1811
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s for tool calling.
… into PromptTarget.send_prompt_async C4 lands the in-tree wiring for the generic tool-use loop introduced by C2/C3: - TargetCapabilities gains supports_tool_use: bool (default False) and CapabilityName.TOOL_USE for the corresponding enum value, matching the existing supports_X / "supports_X" naming convention used by every other capability. - TargetConfiguration grows tool_event_policy + tool_backend kwargs, both gettable/settable properties. The setter (and constructor) validate that a non-None tool_backend requires supports_tool_use=True; otherwise they raise ValueError immediately. ToolBackend / ToolEventPolicy imports are quoted + behind TYPE_CHECKING to keep pyrit.prompt_target.common from importing pyrit.tools eagerly. - PromptTarget.send_prompt_async picks up @tool_loop (below the existing @Final). The wrapper is a no-op when tool_event_policy is None, so every existing target keeps its current behavior. _tool_parser (property, default None) and _tool_schemas() (default []) are added on the base class as the two collaborators @tool_loop reads. - _permissive_configuration is updated to flip supports_tool_use=True alongside the other supports_X flags so the all-flags-on probe loop in test_discover_target_capabilities still sees every CapabilityName value as supported. tests/unit/tools/conftest.py drops the hand-decorated @tool_loop on _FakeToolTarget.send_prompt_async (which would now violate the base class's @Final) and instead wires policy + backend through TargetConfiguration. _tool_parser becomes a subclass property since the base class now defines one. Tests: - test_tool_event_policy.py adds U7 (capability flag wiring through the wrapper) plus dataclass field defaults and the TargetConfiguration validator. - test_prompt_target_tool_loop.py adds U1 / U2 (DB-end) / U8 / U9 / U11 exercised against a _ProductionShapedTarget that uses the real base-class _get_normalized_conversation_async (memory round-trip via patch_central_database). Plus default-_tool_parser / -_tool_schemas assertions. Validation: 8104 unit tests pass; pre-commit clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Introduces a generic, target-agnostic tool-use primitive: a new pyrit/tools/ package with a tool_loop decorator (applied to PromptTarget.send_prompt_async), ToolBackend ABC with LocalToolBackend + MCPToolBackend (stdio) implementations, MCP client/server-spec types, ToolCallParser protocol, a new TargetCapabilities.supports_tool_use flag, and TargetConfiguration.tool_event_policy / tool_backend fields. Two new exception classes (ToolCallNotSupported, ToolCallLoopLimitExceeded) carry partial-conversation state. The PR is marked DRAFT; the OpenAI target migrations described in the PR description are not yet present in the diff.
Changes:
- New
pyrit.toolspackage (ToolCall,tool_loop, backends, MCP client/specs, parsers) - Base
PromptTarget.send_prompt_asyncmade@final @tool_loop, with default no-op_tool_parser/_tool_schemas; capability + configuration fields added mcp>=1.0,<2added as a core (non-optional) dependency; new unit tests against a realFastMCPstdio subprocess
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| pyrit/tools/init.py | Public re-exports for the new tools package |
| pyrit/tools/models.py | ToolCall, ToolEventPolicy, tool_loop decorator core |
| pyrit/tools/backend.py | ToolBackend ABC with default sequential dispatch |
| pyrit/tools/local_backend.py | In-process callable backend with error envelopes |
| pyrit/tools/parsers.py | ToolCallParser protocol + canonical filter helper |
| pyrit/tools/mcp_client.py | Stdio MCPClient, three MCPServerSpec variants (only Local implemented) |
| pyrit/tools/mcp_backend.py | Multi-server routing, name-prefixing, allow-listing |
| pyrit/prompt_target/common/prompt_target.py | Apply @tool_loop to send_prompt_async; default tool hooks |
| pyrit/prompt_target/common/target_capabilities.py | New TOOL_USE capability and supports_tool_use flag |
| pyrit/prompt_target/common/target_configuration.py | New tool_event_policy / tool_backend fields + validators |
| pyrit/prompt_target/common/discover_target_capabilities.py | Permissive profile enables supports_tool_use |
| pyrit/exceptions/exception_classes.py, init.py | ToolCallNotSupported, ToolCallLoopLimitExceeded |
| pyproject.toml, uv.lock | mcp>=1.0,<2 added as a core dependency + transitive deps |
| tests/unit/tools/* | Decorator, policy wiring, local backend, MCP client/backend, real stdio echo server fixture |
…nd Response target
This commit is intentionally empty. It records a scope decision made in
response to PR review feedback. No code changes - the C5 working set was
uncommitted and has been reverted.
# Why we're dropping C5
Review feedback raised two concerns the original C5 did not address:
1. **Duplication against OpenAIResponseTarget.** The Response target
already implements an agentic tool loop (openai_response_target.py
lines 590-626), the canonical function_call envelope (lines 666-674),
a Python-callable dispatch registry (custom_functions), and an
allow-list-ish hook (fail_on_missing_function). C5 layered a parallel
implementation on top for the Chat target instead of converging both
targets onto one stack.
2. **Chat Completions is on its way out.** OpenAI has publicly framed
the Responses API as the long-term replacement for Chat Completions.
Investing in tool-call plumbing for a deprecated endpoint ages out
fast and obscures the actual value of this PR.
The right framing is: this PR is not "tool calling for all targets." It
is "pluggable tool-execution backends + a client-side agentic loop for
non-Responses-API targets." The Responses API is one transport; this PR
is the in-process abstraction that works for every transport.
# What survives unchanged
C1 (mcp SDK dep), C2 (tools/ scaffold + LocalToolBackend), C3 (MCPClient
+ MCPToolBackend + Docker stub), and C4 (capability flag + @tool_loop
wired on the base class) all remain shipped. The genuinely-novel work -
local stdio MCP, pluggable backend ABC, ToolEventPolicy (RAISE /
EXECUTE / RETURN_RAW), allowed_tools - is unaffected.
# The new design
**One agentic loop driver.** The @tool_loop decorator on
PromptTarget.send_prompt_async (shipped in C4) is the only loop driver.
Every target's _send_prompt_to_target_async returns exactly ONE Message
per call. The decorator stitches iterations into the response list.
**One tool execution layer.** Every dispatched call flows through
ToolBackend.dispatch_async(call) -> envelope. Backends (LocalToolBackend
for Python callables, MCPToolBackend for stdio MCP subprocesses, future
DockerMCPToolBackend, future CompositeToolBackend) are interchangeable
behind a single ABC.
**Migrate OpenAIResponseTarget onto the decorator (new C5).** Delete
the in-class while loop (lines 590-626). _send_prompt_to_target_async
becomes "build body, call API, parse response into one Message, return."
Add _tool_parser returning CanonicalEnvelopeParser (extracts only
function_call pieces; reasoning, mcp_call, web_search_call, etc.
continue to pass through to Memory without dispatch). Translate the
configured backend's schemas into the Responses-API tools shape inside
_construct_request_body (without clobbering an existing
extra_body_parameters["tools"]). Wrap custom_functions as a
LocalToolBackend internally with DeprecationWarning(removed_in="0.16.0"),
preserving the existing fail_on_missing_function semantics.
**Integration tests (new C6).** Rewrite to use the Response target as
the sole OpenAI tool-calling path, plus end-to-end scenario tests
against the real echo_mcp_server.
**OpenAIChatTarget receives no tool-calling support in this PR.** A
future PR can pull Chat onto the same abstractions if anyone still
wants it, but the recommended OpenAI tool-calling path becomes the
Responses API.
# Risks
* Behavior-parity on the Response target: callers that rely on
`len(send_prompt_async(...)) == iterations` rather than scanning
piece types will need updating. Existing function-chaining tests act
as sentinels.
* `custom_functions` deprecation must preserve `fail_on_missing_function`
semantics through the LocalToolBackend wrapper.
* Response parser must continue to round-trip non-`function_call` piece
types (reasoning, mcp_call, etc.) to Memory without dispatching.
* `extra_body_parameters["tools"]` takes precedence over backend-derived
tools so existing manual configs keep working.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
C6 collapses the Response target in-class agentic loop into the @tool_loop decorator shipped in C4, and routes tool dispatch through LocalToolBackend (wrapping the existing custom_functions registry as a deprecation shim). # What changed - _send_prompt_to_target_async no longer runs a while loop. It now returns exactly one Message per call. The agentic loop is driven by @tool_loop on the base class. - Added _tool_parser returning CanonicalEnvelopeParser from pyrit/tools/parsers.py. The parser extracts only function_call pieces; reasoning, mcp_call, web_search_call, computer_call, local_shell_call, etc. pass through to Memory unchanged because the parser ignores them and the decorator exits cleanly on the empty parse. - Added _tool_schemas() translating the configured backend schemas into the Responses-API tools shape. - _construct_request_body injects tools=... when the backend has schemas. User-supplied extra_body_parameters["tools"] takes precedence. - supports_tool_use=True on _DEFAULT_CONFIGURATION. - custom_functions= now emits DeprecationWarning(removed_in="0.16.0"). Internally wraps into a LocalToolBackend. A LocalToolBackend is always installed (populated or empty) so legacy target._custom_functions[name]=fn mutations keep affecting dispatch via a back-compat property. - Constructor deep-copies the class-level _DEFAULT_CONFIGURATION before mutating it (PromptTarget.get_default_configuration returns the singleton, so otherwise one instances tool_backend would leak across every other instance). # What did NOT change The legacy _find_last_pending_tool_call, _execute_call_section, and _make_tool_piece helpers remain in place. They are no longer called from production code, but existing tests still cover them; cleanup is deferred to the same follow-up PR that removes the custom_functions kwarg after the 0.16.0 deprecation window. # Tests - New tests/unit/prompt_target/target/test_openai_response_target_c6_migration.py with 7 tests covering deprecation warning, dispatch through user-supplied LocalToolBackend, schema injection, extra_body precedence, no-backend behavior, and reasoning-only passthrough. - All 5 existing function-chaining sentinel tests in test_openai_response_target_function_chaining.py pass unchanged: the back-compat _custom_functions property keeps in-place mutations working. 8131 unit tests green; pre-commit clean (ruff format, ruff check, ty). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
C7 adds end-to-end integration coverage of the @tool_loop decorator, MCPToolBackend, and MCPClient stack against the real echo_mcp_server subprocess. Only the OpenAI Responses HTTP layer is mocked; the MCP stdio subprocess, AsyncExitStack lifecycle, canonical envelope round-trip, and RedTeamingAttack execution path all run unmocked. # What ships tests/integration/tools/test_red_teaming_with_tools.py with three tests: 1. test_red_teaming_response_target_with_mcp_echo - end-to-end smoke test. RedTeamingAttack drives OpenAIResponseTarget configured with a MCPToolBackend pointing at echo_mcp_server. The Responses API mock returns one function_call followed by a stop response. Asserts the tool call actually reaches the MCP subprocess and the result lands back in the second API call as a function_call_output. 2. test_red_teaming_persists_canonical_transcript_in_memory - verifies the canonical envelope contract (plan section 13). Reads the conversation back from Memory after attack.execute_async returns and asserts the function_call and function_call_output pieces are present, in order, with matching call_ids. 3. test_red_teaming_dispatches_all_tool_calls_per_turn - regression test for the intentional behavior change from C6. The pre-C6 in-class loop in OpenAIResponseTarget only dispatched the LAST function_call per turn; the @tool_loop decorator now dispatches every call in declaration order. Issues both echo and add in one response and asserts both results land in the next API call. # Test infrastructure - LocalMCPServerSpec uses command=sys.executable + args=(echo_server,). - Mock objective scorer returns a true score so RedTeamingAttack exits cleanly after one turn. - Mock adversarial target returns a single scripted prompt wrapped as list[Message] (PromptTarget.send_prompt_async contract). - Score, ComponentIdentifier, and PromptTarget MagicMock(spec=...) usage matches the existing tests/unit/executor/attack patterns. All three integration tests pass; pre-commit clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a parser that walks text MessagePieces for marker-delimited
JSON blocks of the form {"name": ..., "arguments": {...}} and emits
canonical ToolCall instances. Marker pattern, call_id prefix, and
surrounding-text policy (truncate / extract-all / strict) are all
constructor-controlled so a single class covers angle-bracket,
pipe-delimited tag pair, and other chat-template syntaxes.
The parser is the F1 (per plan) piece that lets non-Responses-API
targets participate in PyRIT's @tool_loop without a per-vendor
parser implementation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TargetConfiguration.as_identifier_params() now snapshots the configured tool_event_policy (behavior + max_tool_iterations) and tool_backend (backend class + sorted list of advertised tool names). Two targets that differ only in their tool backend now get distinct identifiers, which downstream consumers rely on to route by target identity. Schema serialization is best-effort: backends with shape-quirky schemas that lack a recoverable 'name' field are silently dropped from the identifier surface. Exact callables and transports are not serialized because they are not deterministic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PyRIT's docs build uses MyST, not reStructuredText, so reST roles like :class:\Foo\ render as literal text in the rendered docs and mismatch the rest of the codebase. Convert all roles in the new pyrit/tools/ module to plain double-backtick code spans, and drop the in-flight commit-numbering references (C1/C2/...) that were carry-overs from the shipping plan and no longer mean anything in source. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aises Three small cleanups in the new tools test suite: 1. Remove @pytest.mark.asyncio decorators -- the project sets asyncio_mode='auto' in pyproject.toml so the marker is a no-op that creates the appearance of opt-in async test discovery. 2. Narrow pytest.raises((AttributeError, Exception)) to dataclasses.FrozenInstanceError on the two frozen-dataclass guards in test_mcp_client.py. The previous pattern matched every Exception and would have masked unrelated regressions. 3. Drop in-flight C1/C2/.../C10 commit-id strings from test docstrings; they referenced the shipping plan, not the source tree, and read as noise after the commits land. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Cool idea; can we have a design meeting? |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 30 out of 31 changed files in this pull request and generated 13 comments.
Comments suppressed due to low confidence (1)
pyrit/prompt_target/openai/openai_response_target.py:23
- Deprecation warnings should go through
pyrit.common.deprecation.print_deprecation_messagerather than callingwarnings.warndirectly, to keep formatting/stacklevel consistent and filterable across the codebase.
import json
import logging
import warnings
from collections.abc import Awaitable, Callable, MutableSequence
from enum import Enum
from typing import (
Any,
Literal,
Optional,
cast,
)
from openai.types.shared import ReasoningEffort
from pyrit.common.data_url_converter import convert_local_image_to_data_url_async
from pyrit.exceptions import (
EmptyResponseException,
PyritException,
pyrit_target_retry,
)
| for _ in range(max_iter): | ||
| responses_this_turn = await self._send_prompt_to_target_async( | ||
| normalized_conversation=normalized_conversation, | ||
| ) | ||
| all_responses.extend(responses_this_turn) | ||
|
|
||
| if parser is None: | ||
| return all_responses | ||
|
|
||
| last_response = responses_this_turn[-1] |
| results = await backend.dispatch_all_sequential_async(pending_calls) | ||
| tool_msg = _build_function_call_output_message( | ||
| reference_piece=last_response.message_pieces[0], | ||
| outputs=results, | ||
| ) | ||
| all_responses.append(tool_msg) | ||
| normalized_conversation = list(normalized_conversation) + [last_response, tool_msg] | ||
|
|
| from pyrit.models.json_response_config import _JsonResponseConfig | ||
| from pyrit.prompt_target.common.target_capabilities import CapabilityName, TargetCapabilities | ||
| from pyrit.prompt_target.common.target_configuration import TargetConfiguration | ||
| from pyrit.tools import ToolCallParser, tool_loop | ||
|
|
| if custom_functions: | ||
| warnings.warn( | ||
| "OpenAIResponseTarget(custom_functions=...) is deprecated and will be " | ||
| "removed in 0.16.0. Configure tool_backend on TargetConfiguration " | ||
| "instead (e.g. LocalToolBackend(callables=..., schemas=..., " | ||
| "fail_on_missing_function=...)).", | ||
| DeprecationWarning, | ||
| stacklevel=2, | ||
| ) |
| @pytest.mark.asyncio | ||
| async def test_red_teaming_response_target_with_mcp_echo(patch_central_database): |
| class TestToolBackendDispatch: | ||
| """The modern path: pass tool_backend via TargetConfiguration.""" | ||
|
|
||
| @pytest.mark.asyncio |
| class TestToolSchemasInjection: | ||
| """_construct_request_body injects backend schemas when present.""" | ||
|
|
||
| @pytest.mark.asyncio |
| assert body["tools"][0]["type"] == "function" | ||
| assert body["tools"][0]["name"] == "get_weather" | ||
|
|
||
| @pytest.mark.asyncio |
| ) | ||
| assert body["tools"] == legacy | ||
|
|
||
| @pytest.mark.asyncio |
| must therefore see an empty parse and exit cleanly. | ||
| """ | ||
|
|
||
| @pytest.mark.asyncio |
…all_output pieces ChatMessageNormalizer raised on function_call / function_call_output data types, which meant any target whose wire format runs through it (AzureMLChatTarget, HuggingFaceChatTarget, OpenAIChatTarget) could not round-trip a tool-call conversation through @tool_loop. Adds a per-message tool-message detector that converts function_call pieces to an assistant message with content=null and a ToolCall populated from the canonical envelope, and function_call_output pieces to a role=tool message with tool_call_id set from the envelope's call_id and content set to the output. Matches the OpenAI Chat Completions wire shape. Also fixes ChatMessage.ToolCall whose 'function' field was typed as a bare string; OpenAI ships it as a nested object with name + arguments. ChatMessage.content now permits None for assistant messages that carry only tool_calls (the OpenAI API requires content=null in that shape). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The base default for _tool_schemas() now reads self.configuration.tool_backend.schemas verbatim. Subclasses that need wire-format wrapping (currently only OpenAIResponseTarget, which prepends type=function) override the method and reuse the base via super() to get the raw schemas. Removes a small but real duplication risk for the upcoming AzureMLChatTarget / HuggingFaceChatTarget tool-calling paths, which would otherwise each reimplement the 'read schemas from configured backend' boilerplate. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AzureMLChatTarget now participates in PyRIT's @tool_loop when callers supply a ToolCallParser at construction. The parser flips supports_tool_use=True on the default capabilities so callers don't need to construct a custom_configuration just to opt in. A convenience tool_backend kwarg installs the backend onto the configuration in one step. Wire format: _tool_schemas() wraps the backend's schemas in the OpenAI Chat Completions tools shape (with each schema nested under a "function" key). _construct_http_body_async injects the wrapped schemas as a top-level tools field when non-empty. Deployments unwrap that envelope before passing to tokenizer.apply_chat_template; see plan section 12.9 for the contract. Response handling: _complete_chat_async now returns the parsed JSON body (was: string output). The new _materialize_response walks the response dict and emits one text MessagePiece for the output field plus one function_call MessagePiece per envelope in the tool_calls field; CanonicalEnvelopeParser then finds those pieces in the loop's next iteration. The no-tools path is unchanged: requests without tool_parser produce byte-identical request bodies, verified by test_request_body_omits_tools_key_when_no_backend. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Same shape as the AzureMLChatTarget F2 change: callers supply a
ToolCallParser at construction; the parser flips
supports_tool_use=True on the default capabilities so no
custom_configuration is required to opt in. A convenience tool_backend
kwarg installs the backend onto the configuration in one step.
Wire format differs from AzureML because HuggingFace runs the model
in-process via the transformers library:
* _tool_schemas() returns the bare backend schemas (no OpenAI
envelope) because tokenizer.apply_chat_template expects bare
function schemas, not the Chat Completions wrapper.
* _apply_chat_template forwards tools= into apply_chat_template
when schemas are present; the model's tool-trained chat template
renders the model-family-specific tools block (Qwen wraps in
<tools>...</tools>, Llama uses a system-message preamble, etc.).
* _build_chat_messages now walks every piece in each message and
converts function_call / function_call_output envelopes to the
chat-template tool message shape (assistant + tool_calls list,
role=tool + tool_call_id) so the model sees the canonical
in-context tool conversation.
The no-tools path is unchanged: without tool_parser, no tools key is
passed to apply_chat_template and no tool message translation runs.
The user-supplied tool_parser walks the response text for inline
tool-call markers; InlineToolCallParser is the typical choice for
ChatML-style angle-bracket markers, but the user can supply any
ToolCallParser implementation (different marker regex, different
mode).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds tests/integration/tools/test_azure_ml_with_tools_integration.py exercising the full PyRIT @tool_loop stack against AzureMLChatTarget with only the HTTP layer mocked. The mocked responses match the §12.9.2 canonical envelope shape: first response carries a tool_calls field that the loop dispatches via LocalToolBackend; second response is the final assistant text. Asserts the canonical four-piece transcript shape persists in Memory: [user text, assistant function_call, tool function_call_output, assistant text], with the call_id round-tripping between the assistant function_call piece and the tool function_call_output piece, and the tool output reflecting the actual dispatched callable's return value. Also covers the no-tools backward-compatibility path: a target constructed without tool_parser produces a request body that has no tools key, proving the F2 changes do not regress existing AzureML deployments that don't carry the patched scoring script. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous cleanup commit (31ed2fb) removed the pyrit/tools/ package and tests/unit/tools/ directory, but several tool-calling changes from PR microsoft#1811 (MCP) remained mixed in: - pyrit/exceptions: ToolCallNotSupported and ToolCallLoopLimitExceeded - pyrit/prompt_target/common/: @tool_loop decoration on send_prompt_async, supports_tool_use capability, tool_event_policy and tool_backend slots on TargetConfiguration - pyrit/prompt_target/openai/openai_response_target.py: migration onto @tool_loop + LocalToolBackend (the in-class agentic loop was removed in favor of the decorator) - tests/integration/tools/ and tests/unit/prompt_target/target/test_openai_response_target_c6_migration.py - pyproject.toml + uv.lock: mcp Python SDK dependency All of the above are reverted to origin/main. The adversarial benchmark refactor (this PR's actual scope) is unaffected; 128 targeted unit tests across openai_response_target, function_chaining, and scenario/benchmark still pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
PyRIT's existing tool-calling story is fragmented:
OpenAIChatTargetparsestool_callsintofunction_callpieces and stops — no execution, no loop.OpenAIResponseTargethand-rolls a complete agentic loop inside_send_prompt_to_target_async, accepts acustom_functionsregistry of Python callables, and dispatches one tool call per turn.This PR introduces a single, target-agnostic tool-use primitive that any
PromptTargetsubclass can opt into:pyrit/tools/package with atool_loopdecorator wired ontoPromptTarget.send_prompt_async, aToolCallParserprotocol (per-target detection), and aToolBackendABC with two concrete backends —LocalToolBackend(in-process Python callables) andMCPToolBackend(stdio MCP servers via the officialmcpSDK).TargetCapabilities.supports_tool_usecapability flag plusToolEventPolicy(EXECUTE/RAISE/RETURN_RAW) and atool_backendslot onTargetConfiguration. The policy lets red-team callers observe attempted tool use without executing it, or hand the raw response back untouched.OpenAIResponseTargetmigrated onto the decorator. The in-class agentic loop is gone;_send_prompt_to_target_asyncreturns exactly oneMessageper call and the decorator stitches multi-turn iterations into the response list. Multiple tool calls in a single turn are now dispatched all-at-once sequentially — the protocol-intended behavior.InlineToolCallParserthat walks text pieces for marker-delimited JSON blocks (configurable regex; defaults to angle-bracket syntax). Non-OpenAI deployments that emit tool calls inline in generated text can opt in by supplying this as_tool_parser.AzureMLChatTargetandHuggingFaceChatTargetgain optionaltool_parserandtool_backendconstructor kwargs that opt them into the decorator without subclassing. Supplying a parser flipssupports_tool_use=Trueon the default capabilities so callers don't need acustom_configurationjust to enable tool use. The two targets use different wire-format wrappings (AzureMLChatTargetwraps schemas in the OpenAI Chat Completions{"type":"function","function":{...}}envelope;HuggingFaceChatTargetpasses bare schemas straight intotokenizer.apply_chat_template).ChatMessageNormalizernow serializesfunction_callandfunction_call_outputpieces into the OpenAI Chat Completions wire shape (assistant message withtool_calls;role="tool"message withtool_call_id). This is what makes the chat-completions-shaped targets above able to round-trip tool conversations through@tool_loopwithout target-side translation code.custom_functionskwarg onOpenAIResponseTargetis deprecated (removed_in="0.16.0"); internally rewrapped as aLocalToolBackendso the legacy path keeps working through one release cycle.OpenAIChatTargetis intentionally left as-is. The Responses API is the modern agentic surface for OpenAI; new tool-calling investment there would age poorly. Targets that need tool calling for non-Responses-API endpoints opt into the decorator by supplying a parser and a backend.Future MCP transports (HTTP/SSE, Docker sandbox), additional sandbox providers, and streaming all plug in behind the existing
ToolBackend/MCPServerSpecinterfaces with no abstraction changes. TheMCPServerSpecunion ships with three variants:LocalMCPServerSpec(the only one with a working transport) plus stub declarations ofRemoteMCPServerSpecandDockerMCPServerSpecwhoseconnect_asyncraisesNotImplementedError. Future PRs implement an already-declared variant rather than expanding the union.Tracks deferred work via TODOs marked
# TODO(streaming-v2),# TODO(mcp-http-transport),# TODO(mcp-resources), and# TODO(sandbox-provider).Compatibility
This PR is not breaking for the standard tool-calling path. Compatibility caveats reviewers should know about:
PromptTarget.send_prompt_asyncis@final. External subclasses that override the public entrypoint (not just_send_prompt_to_target_async) will fail to import. No in-tree target overrides it today.OpenAIResponseTarget(custom_functions=...). The kwarg now emitsDeprecationWarning(removed_in="0.16.0")and is internally rewrapped as aLocalToolBackend. No runtime behavior change in the current release cycle.OpenAIResponseTarget._find_last_pending_tool_call,_execute_call_section, and_make_tool_pieceare no longer called from production code. Listed for changelog completeness — these were always private.Tests and Documentation
tests/unit/tools/directory covering the decorator, parsing,LocalToolBackend,MCPClient(real stdio subprocess against a deterministicFastMCPfixture), andMCPToolBackend(multi-server routing, name-collision detection,name_prefixdisambiguation,allowed_toolsfiltering, and concurrent-dispatch serialization).tests/unit/prompt_target/common/test_prompt_target_tool_loop.pyasserting decorator order-of-execution against a fake target and usingpatch_central_databaseto verify per-message insert ordering, per-role labeling (assistant,tool), and per-data-type labeling (function_call,function_call_output) against the actual DB schema.tests/unit/prompt_target/target/test_openai_response_target_c6_migration.pycovering the migration onto@tool_loop, the deprecation warning oncustom_functions, schema injection into request bodies,extra_body_parameters["tools"]precedence, and multi-call-per-turn sequential dispatch.test_openai_response_target_function_chaining.pysentinel tests pass unchanged: the back-compat property on_custom_functionskeeps in-place mutations working.tests/unit/tools/test_inline_parser.pycoveringInlineToolCallParseracross marker syntaxes (angle-bracket, pipe-delimited tag pair, square-bracket list payload), mode coverage (TRUNCATE_AT_LAST/TRUNCATE_AT_FIRST/EXTRACT_ALL/STRICT_TRAILING_EMPTY), and edge cases (empty input, malformed JSON, missingnamefield, multi-piece messages).tests/unit/prompt_target/target/test_azure_ml_chat_target.pyandtests/unit/prompt_target/target/test_huggingface_chat_target.pywith coverage for the newtool_parser/tool_backendkwargs: capability flipping, backend installation, request-body shape, no-tools backward compatibility, and (for the AzureML side) response materialization intofunction_callpieces.tests/unit/message_normalizer/test_chat_message_normalizer.pywith full round-trip coverage of tool-piece serialization (function_call→ assistanttool_calls,function_call_output→role=toolwithtool_call_id).tests/integration/tools/test_red_teaming_with_tools.pyrunning the realRedTeamingAttackagainstOpenAIResponseTargetwith only the HTTP layer mocked. Tools are served by the realecho_mcp_serversubprocess; the MCP stdio subprocess,AsyncExitStacklifecycle, canonical envelope round-trip, andRedTeamingAttackexecution path all run unmocked.tests/integration/tools/test_azure_ml_with_tools_integration.pyexercising the full PyRIT@tool_loopstack againstAzureMLChatTargetwith only the HTTP layer mocked. Asserts the canonical four-piece transcript (user → assistant function_call → tool function_call_output → assistant text) lands in Memory with matchingcall_idround-tripping.JupyText: not applicable (no notebook changes).